About Firaxis

• Founded in 1996 

• Strategy games! 

• Sid Meier lead designer 

• 20+ shipped games 

• Civilization V 

• XCOM: Enemy Unknown 



FIRAXIS GAMES




"Games that stand the test of time" 
About Me

• I work on the Civilization team 

• Graphics programmer 

• Over 7 years at Firaxis 

• Procedural modeling 

• Terrain rendering 
Civilization V

• Shipped Sept. 2010 

• One of the first DX11 games 

• Variable-bitrate GPU texture 
decompression 

• Hardware tessellation 

• Two large expansions 

• Gods & Kings 

• Brave New World 


Olano et al. Variable Bit Rate GPU Texture Decompression. In EGSR 2011.
Civilization V

• Low-res Heightmap 

• 64x64 per hex 

• Procedurally generated 

• Unique - no repeat 

• High-res Materials 

• 512x512 per hex 

• Artist-created 

• Repeats across the world 
Better Terrain

• Problem: Sharp features 

• Low-res heightmap cannot display 
unique, high-res detail 

• Solution: High-res heightmap 

• More data (Compression? Streaming?) 

• Efficient Tessellation 


GPU Displacement Tessellation 
Demo

Simple procedural terrain... 

• Ridges to test difficult case 

• Assume strategy game camera (lots of pan/zoom) 

• High res: 256x256 Heightmap per tile 

• Large: 128x128 tiles (32,768x32,768 heightmap)

...all done on the GPU 

• Heightmap/Normalmap created on demand 

• Use texture arrays to implement megatexture 

• Tessellation created on demand using GPU 
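One simple way to manage on-demand tiles in a texture array is to treat the array slices as a cache keyed by tile coordinate, evicting the least recently used tile when the array is full. This is a sketch under stated assumptions: the 256-sample tile size comes from the demo, but `TileSliceCache`, `tile_of`, and the LRU policy are hypothetical illustrations, not Firaxis's implementation. On the GPU, a `needs_generate` result would trigger the heightmap/normalmap compute passes for that slice.

```python
from collections import OrderedDict

TILE_SIZE = 256  # heightmap samples per tile edge in the demo


def tile_of(sample_x, sample_y, tile_size=TILE_SIZE):
    """Split a global heightmap coordinate into (tile coord, local coord)."""
    tx, lx = divmod(sample_x, tile_size)
    ty, ly = divmod(sample_y, tile_size)
    return (tx, ty), (lx, ly)


class TileSliceCache:
    """Hypothetical LRU map from tile coordinates to texture-array slices."""

    def __init__(self, num_slices):
        self.free = list(range(num_slices - 1, -1, -1))  # pop() yields slice 0 first
        self.slots = OrderedDict()  # (tx, ty) -> slice, least recently used first

    def lookup(self, tx, ty):
        """Return (slice, needs_generate); needs_generate means build height/normals."""
        key = (tx, ty)
        if key in self.slots:
            self.slots.move_to_end(key)  # mark as most recently used
            return self.slots[key], False
        if self.free:
            slice_index = self.free.pop()
        else:
            _, slice_index = self.slots.popitem(last=False)  # evict the LRU tile
        self.slots[key] = slice_index
        return slice_index, True
```

With 128x128 tiles but far fewer resident slices, only the tiles the camera has recently seen occupy texture memory.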
Overview

• Fixed Tessellation 

• Spoiler: Doesn't work well 

• Hardware Tessellation 

• Easy to implement 

• Better performance 

• Questionable quality 

• Variable Software Tessellation 

• Complex to implement 

• Great quality/performance balance 
Fixed Tessellation

• Pre-tessellate fixed-res mesh 

• Render same mesh for each cell 

• Displace in VS 

• High-res is slow 

• Lots of geometry (IA/Memory) 

• Tiny triangles (Quad utilization) 

• Low-res is ugly 

• Triangles do not match data 
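The fixed approach can be sketched as one shared grid mesh plus a per-cell displacement step. This Python stand-in for the vertex-shader work assumes a unit-sized cell and a hypothetical `height_at(x, y)` callback; it mainly shows why high resolutions get expensive, since the full grid is paid for in every cell whether or not the terrain there has any detail.

```python
def fixed_grid(n):
    """Pre-tessellated n x n quad grid over a unit cell.

    Returns (n + 1)**2 vertices as (u, v) and 2 * n**2 counter-clockwise triangles.
    """
    verts = [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]
    tris = []
    for j in range(n):
        for i in range(n):
            a = j * (n + 1) + i                    # lower-left corner of this quad
            b, c, d = a + 1, a + n + 1, a + n + 2  # lower-right, upper-left, upper-right
            tris += [(a, b, d), (a, d, c)]
    return verts, tris


def displace(verts, cell_x, cell_y, height_at):
    """Per-cell vertex work: place the shared grid at the cell and look up height."""
    return [(cell_x + u, cell_y + v, height_at(cell_x + u, cell_y + v)) for u, v in verts]
```

At the demo's 256x256 samples per tile, one quad per sample means 2 x 256^2 = 131,072 triangles per tile, or 2^31 triangles across all 128x128 tiles before culling.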
Hardware Tessellation

• Continuously variable tessellation levels 

• Complex resampling of displacement map 

• Blurring - high frequency data disappears 

• Aliasing - "Sliding" or "Shifting" artifacts 

• Power-of-two tessellation levels 

• Much easier sampling of displacement map 

• Hard to change tessellation level without "popping" 
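The sampling advantage of power-of-two levels is easy to see along one patch edge. When a patch spans a power-of-two number of height samples, a power-of-two tessellation factor puts every generated vertex exactly on a heightmap sample, while an arbitrary factor lands between samples and forces filtering (the blurring and sliding problems above). The helper names here are hypothetical.

```python
def edge_sample_positions(patch_samples, tess_factor):
    """Heightmap offsets hit by the vertices along one edge of a tessellated patch."""
    return [i * patch_samples / tess_factor for i in range(tess_factor + 1)]


def on_sample_grid(patch_samples, tess_factor):
    """True if every vertex can read the heightmap directly, with no filtering."""
    return all(p.is_integer() for p in edge_sample_positions(patch_samples, tess_factor))
```

For a 16-sample patch, factors 1, 2, 4, 8 and 16 all stay on the sample grid; a factor of 6 does not.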
View-Based Hull Shaders

• Use camera information to set tessellation level 

• Distance from camera 

• Height of camera (Civ V), best for strategy games

• Projected screen size 

• Silhouette enhancement 

• ...and variations 
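A camera-height rule in the spirit of Civilization V can be sketched as a factor that halves whenever the camera height doubles, clamped and snapped down to a power of two. The constants (`max_factor`, `reference_height`) and the exact falloff are assumptions for illustration, not the shipped formula.

```python
import math


def tess_factor_from_height(camera_height, max_factor=64, reference_height=100.0):
    """Hypothetical view-based rule: full detail at or below reference_height,
    halving the factor each time the camera height doubles above it."""
    raw = max_factor * reference_height / max(camera_height, 1e-6)
    raw = min(max(raw, 1.0), float(max_factor))  # clamp to [1, max_factor]
    return 2 ** int(math.floor(math.log2(raw)))  # snap down to a power of two
```

Nothing here looks at the heightmap itself: a flat plain and a ridge get the same factor at the same camera height.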
View-Based Hull Shaders


(Figure: triangle-density scale with labels Smooth data, Civilization V, "Tiny triangle", Equal sample/triangle density, and Threshold of perception for triangle density shift; annotated Quality/Performance gap)
Quad covers 1x1 height samples
Data-Based Hull Shaders



(Figure: Smooth Patch vs. Sharp Patch)
Data-Based Hull Shaders

• Does quad (0,0)x(1,1) contribute to the final image?

• We can easily run this test at power-of-two resolutions

• At level N, skip 2^N samples

• Increase the threshold at each resolution (Demo: multiply by 1.7)


(Figure: a large delta is over the threshold, so the quad does contribute; a small delta is under the threshold, so it does not contribute)
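The slides do not define "delta" exactly. One plausible choice, assumed here, is the quad's deviation from planarity, |h00 - h10 - h01 + h11|, which is zero exactly when two triangles already reproduce the four corner heights. The sketch applies it at level N with corners 2^N samples apart and multiplies the threshold by 1.7 per level, as in the demo; the heightmap is a list of rows indexed `h[y][x]`.

```python
GROWTH = 1.7  # threshold multiplier per level (demo value)


def quad_delta(h, level, x, y):
    """Assumed delta for quad (x, y) at `level`: corners are 2**level samples apart.

    |h00 - h10 - h01 + h11| is zero exactly when the four corners are coplanar,
    i.e. when two triangles already reproduce the quad.
    """
    s = 1 << level
    x0, y0 = x * s, y * s
    h00, h10 = h[y0][x0], h[y0][x0 + s]
    h01, h11 = h[y0 + s][x0], h[y0 + s][x0 + s]
    return abs(h00 - h10 - h01 + h11)


def contributes(h, level, x, y, base_threshold):
    """Over the level's threshold means the quad contributes to the final image."""
    return quad_delta(h, level, x, y) > base_threshold * GROWTH ** level
```

With a base threshold of 1.9, a ridge whose corners twist by 2 units counts at level 0 but not at level 1, where the threshold has grown to 3.23.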
Data-Based Hull Shaders


• Build MIP hierarchy of 'necessary' quads 

• Run compute kernel across each level 

• Results in tessellation level for patch 

Since we limited ourselves to pow2 tessellation 

(Figure: Level 0 and Level 1 marks)


Kernel:

    if lower level quad marked,
        output lower level
    else if this quad passes test
        output this level
    else
        output nothing
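A CPU sketch of this reduction, assuming a non-planarity delta |h00 - h10 - h01 + h11| and a threshold multiplied by 1.7 per level (the slides do not spell out the delta). Each quad stores the finest level that contributes beneath it, or `None` for "output nothing"; the value left at the patch level gives the patch's power-of-two tessellation factor. On the GPU each level would be one compute dispatch.

```python
GROWTH = 1.7  # threshold multiplier per level (demo value)


def quad_delta(h, level, x, y):
    """Assumed delta: non-planarity of the quad's corners, 2**level samples apart."""
    s = 1 << level
    x0, y0 = x * s, y * s
    return abs(h[y0][x0] - h[y0][x0 + s] - h[y0 + s][x0] + h[y0 + s][x0 + s])


def patch_tess_levels(h, patch_level, base_threshold):
    """Return a grid (one entry per patch) of the finest contributing level, or None."""
    n0 = len(h) - 1  # level-0 quads per side
    below = [[0 if quad_delta(h, 0, x, y) > base_threshold else None
              for x in range(n0)] for y in range(n0)]
    for level in range(1, patch_level + 1):
        n = n0 >> level
        threshold = base_threshold * GROWTH ** level
        current = []
        for y in range(n):
            row = []
            for x in range(n):
                marked = [below[2 * y + dy][2 * x + dx]
                          for dy in (0, 1) for dx in (0, 1)
                          if below[2 * y + dy][2 * x + dx] is not None]
                if marked:                     # lower level quad marked
                    row.append(min(marked))    # keep the finest level found
                elif quad_delta(h, level, x, y) > threshold:
                    row.append(level)          # this quad passes the test
                else:
                    row.append(None)           # output nothing
            current.append(row)
        below = current
    return below


def tess_factor(level, patch_level):
    """Power-of-two factor: vertices every 2**level samples (1 if nothing contributed)."""
    return 1 if level is None else 1 << (patch_level - level)
```

For the demo's 16x16-sample patches, `patch_level` would be 4, so factors range from 1 to 16.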
Data-Based Hull Shaders


• In demo...

• Higher resolution

• Cell size is 256x256

• 16x16 patches per cell (fastest)

• Cache tessellation levels

• Compute when tile becomes visible

• Large cache texture stores all tessellation levels

• Use Compute Shaders...

• To generate the level hierarchy

• To copy highest level into cache texture

• Use Hull Shader...

• To look up tessellation level for patch

• To match tessellation with neighbors
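Matching neighbors means both patches that share an edge must compute the same factor for it, or cracks appear. A simple symmetric rule, assumed here rather than taken from the slides, is to use the larger of the two patches' cached factors on the shared edge. `factors` is a hypothetical 2D grid of cached power-of-two factors indexed `factors[y][x]`; this sketch ignores patches in neighboring tiles.

```python
def edge_factors(factors, px, py):
    """Edge tessellation factors for patch (px, py), matched with its neighbors."""
    own = factors[py][px]
    rows, cols = len(factors), len(factors[0])

    def shared(nx, ny):
        # max() is symmetric, so the neighbor computes the same value for this edge.
        if 0 <= nx < cols and 0 <= ny < rows:
            return max(own, factors[ny][nx])
        return own  # no neighbor in this grid

    return {
        "left": shared(px - 1, py),
        "right": shared(px + 1, py),
        "bottom": shared(px, py - 1),
        "top": shared(px, py + 1),
    }
```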
Data-Based Hull Shaders


Pros 

• Looking at the heightmap was key 

• Many fewer tiny triangles generated 

• High quality (no compromise) 

Cons 

• Need to compute+store tess levels 

• Does not match data closely 

• Patch positions are fixed
• Patch dicing pattern fixed
• Still many tiny triangles

Can we find a better solution for our 
use case? 
Software Tessellation

• Inspiration: AdaptiveTessellationCS40 

• D3D11 DirectCompute sample from Microsoft 

• Simulate hardware tessellation in software 

• Run on D3D11 downlevel hardware (feature level 10.0)

• Goal: Increase the reach of D3D11-style tessellation

• Why not design a new tessellation algorithm? 

• Custom-built for detailed terrain rendering 

• Custom-built for strategy games

• Run in compute shaders 
Software Tessellation


• Design goals: 

• Avoid tiny triangles 

• High quality 

• Efficiency (for real-time)

• Our solution: 

• Simplify patch definition 

• Generate more patches 

• Data-based patch generation 

• Data-based patch dicing 
Software Tessellation


• Simplify patch definition

• Only support POW2 patches

• Adjacent patches must be within one tessellation level

• No tessellation factors for center

• Edge tessellation factors 0 or 1

• Patch defined by uint4

[Position, Level, Dicing pattern]

Only 16 possible patterns!
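A sketch of the compact patch record. The slides give the uint4 contents ([Position, Level, Dicing pattern]); splitting Position into x and y, and the particular bit assigned to each edge, are assumptions. With one 0-or-1 factor per edge there are 2^4 = 16 dicing patterns.

```python
from typing import NamedTuple

# Bit assignment is an assumption for illustration: one bit per edge.
LEFT, RIGHT, BOTTOM, TOP = 1, 2, 4, 8


class Patch(NamedTuple):
    x: int        # position in the tile, in units of this patch's size
    y: int
    level: int    # the patch spans 2**level height samples per side
    pattern: int  # dicing pattern: a set bit means that edge is split once


def dicing_pattern(left, right, bottom, top):
    """Pack four 0-or-1 edge tessellation factors into a 4-bit pattern (0..15)."""
    return ((LEFT if left else 0) | (RIGHT if right else 0)
            | (BOTTOM if bottom else 0) | (TOP if top else 0))


def to_uint4(patch):
    """The four unsigned values stored per patch: (x, y, level, pattern)."""
    if min(patch) < 0 or patch.pattern > 15:
        raise ValueError("patch fields must be unsigned and pattern must fit in 4 bits")
    return tuple(patch)


def triangle_count(pattern):
    """Triangles for a pattern with no center vertex: the outline has 4 corners
    plus one midpoint per split edge, and k outline vertices give k - 2 triangles."""
    return 2 + bin(pattern).count("1")
```

Under this counting, patterns with three or four split edges need five or six triangles, which is where breaking complex patches into component parts pays off.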
Software Tessellation


Build Tess MIP hierarchy

• Entire tile covered by patches

• No overlapping patches

• Adjacent patches within one level

Kernel 1:

    if lower level quad marked,
        output lower level
    else if lower level neighbor marked
        output this level
    else if this quad passes test
        output this level
    else
        output nothing

Kernel 2:

    if any quad in group marked
        mark all quads in group

(Figure: Level 0 and Level 1 marks)
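A CPU sketch of the two kernels, restated with booleans: `marks[L][y][x]` is True when quad (x, y) at level L is part of the patch tree, rather than storing a level value as the slides' kernel does. Kernel 1 marks a quad when a child is marked, when a lower-level quad just across one of its edges is marked (this keeps neighbors within one level), or when it passes the test; Kernel 2 then marks whole 2x2 sibling groups so the leaves cover the tile without overlapping. The delta test (non-planarity of the corners, threshold multiplied by 1.7 per level) is an assumption, and on the GPU each level would be a compute dispatch.

```python
GROWTH = 1.7  # threshold multiplier per level (demo value)


def quad_delta(h, level, x, y):
    """Assumed delta: non-planarity of the quad's corners, 2**level samples apart."""
    s = 1 << level
    x0, y0 = x * s, y * s
    return abs(h[y0][x0] - h[y0][x0 + s] - h[y0 + s][x0] + h[y0 + s][x0 + s])


def build_tess_mip(h, top_level, base_threshold):
    """marks[L][y][x] is True if quad (x, y) at level L exists in the patch tree."""
    n0 = 1 << top_level
    marks = []
    for level in range(top_level + 1):
        n = n0 >> level
        threshold = base_threshold * GROWTH ** level
        lower = marks[level - 1] if level > 0 else None
        m = 2 * n  # quads per side one level down
        cur = [[False] * n for _ in range(n)]
        # Kernel 1
        for y in range(n):
            for x in range(n):
                kids = nbr = False
                if lower is not None:
                    kids = any(lower[2 * y + dy][2 * x + dx] for dy in (0, 1) for dx in (0, 1))
                    # Lower-level quads just outside this quad's four edges.
                    ring = [(2 * x - 1, 2 * y), (2 * x - 1, 2 * y + 1),
                            (2 * x + 2, 2 * y), (2 * x + 2, 2 * y + 1),
                            (2 * x, 2 * y - 1), (2 * x + 1, 2 * y - 1),
                            (2 * x, 2 * y + 2), (2 * x + 1, 2 * y + 2)]
                    nbr = any(0 <= i < m and 0 <= j < m and lower[j][i] for i, j in ring)
                cur[y][x] = kids or nbr or quad_delta(h, level, x, y) > threshold
        if level == top_level:
            cur[0][0] = True  # the whole tile is always covered by something
        else:
            # Kernel 2: if any quad in a 2x2 sibling group is marked, mark all four.
            for gy in range(0, n, 2):
                for gx in range(0, n, 2):
                    if any(cur[gy + dy][gx + dx] for dy in (0, 1) for dx in (0, 1)):
                        for dy in (0, 1):
                            for dx in (0, 1):
                                cur[gy + dy][gx + dx] = True
        marks.append(cur)
    return marks


def leaf_patches(marks):
    """Emit (x, y, level) for marked quads whose children are not marked."""
    out = []
    for level, grid in enumerate(marks):
        for y, row in enumerate(grid):
            for x, marked in enumerate(row):
                if not marked:
                    continue
                if level > 0 and any(marks[level - 1][2 * y + dy][2 * x + dx]
                                     for dy in (0, 1) for dx in (0, 1)):
                    continue  # split further; its children are the patches
                out.append((x, y, level))
    return out


def leaf_level_grid(patches, top_level):
    """Debug helper: paint each level-0 cell with the level of the patch covering it."""
    n = 1 << top_level
    grid = [[None] * n for _ in range(n)]
    for x, y, level in patches:
        s = 1 << level
        for j in range(y * s, (y + 1) * s):
            for i in range(x * s, (x + 1) * s):
                if grid[j][i] is not None:
                    raise ValueError("overlapping patches")
                grid[j][i] = level
    return grid
```

A flat tile collapses to a single top-level patch, while a lone spike in one corner produces four level-0 patches ringed by level-1 patches.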
Software Tessellation


• Output patches by looking at MIP structure 

• Position, level from location within MIP

• Look at lower-level neighbors to determine dicing pattern 

• Append to patch list 

• Optimization: Break complex patches into component parts 
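Determining the dicing pattern from neighbors can be sketched on top of a grid that records, for every level-0 cell, the level of the patch covering it. Because adjacent patches differ by at most one level, an edge needs splitting exactly when the patch across it is one level finer, and sampling one cell just outside the edge is enough to tell. The bit layout and the `leaf_grid` input are assumptions for this sketch.

```python
# Bit assignment is an assumption for illustration: one bit per edge.
LEFT, RIGHT, BOTTOM, TOP = 1, 2, 4, 8


def dicing_for_patch(leaf_grid, x, y, level):
    """4-bit pattern: set an edge bit when the neighbor across it is finer."""
    n = len(leaf_grid)
    s = 1 << level
    x0, y0 = x * s, y * s

    def finer(i, j):
        return 0 <= i < n and 0 <= j < n and leaf_grid[j][i] < level

    pattern = 0
    if finer(x0 - 1, y0):
        pattern |= LEFT
    if finer(x0 + s, y0):
        pattern |= RIGHT
    if finer(x0, y0 - 1):
        pattern |= BOTTOM
    if finer(x0, y0 + s):
        pattern |= TOP
    return pattern


def emit_patch_list(patches, leaf_grid):
    """Append one (x, y, level, pattern) record per patch, like an append buffer."""
    patch_list = []
    for x, y, level in patches:
        patch_list.append((x, y, level, dicing_for_patch(leaf_grid, x, y, level)))
    return patch_list
```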
Software Tessellation

The direction we split quads is important 

Extensions


• In our demo... 


• Treat patch split direction as separate dicing pattern 

• Process patch list to determine best split direction 

Difference of normals (dot product)
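One way to use a normal-difference test for split direction, sketched under assumptions: for each diagonal, compare each triangle's face normal with the data normal (for example from the normalmap) at the triangle's centroid, and keep the diagonal whose triangles agree best by dot product. The unit-quad layout, the `normal_at(u, v)` lookup, and the centroid sampling are hypothetical choices for illustration.

```python
import math


def normalize(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def face_normal(p0, p1, p2):
    """Unit normal of a counter-clockwise triangle (z is up)."""
    a = (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2])
    b = (p2[0] - p0[0], p2[1] - p0[1], p2[2] - p0[2])
    return normalize((a[1] * b[2] - a[2] * b[1],
                      a[2] * b[0] - a[0] * b[2],
                      a[0] * b[1] - a[1] * b[0]))


def choose_split(h00, h10, h01, h11, normal_at):
    """Pick the diagonal whose two triangles best agree with the data normals.

    normal_at(u, v) is a hypothetical normal-map lookup over the unit quad.
    Agreement is the dot product of each face normal with the data normal at
    the triangle's centroid; the split with the larger total wins.
    """
    p00, p10, p01, p11 = (0, 0, h00), (1, 0, h10), (0, 1, h01), (1, 1, h11)
    splits = {
        "00-11": [(p00, p10, p11), (p00, p11, p01)],
        "10-01": [(p00, p10, p01), (p10, p11, p01)],
    }

    def score(tris):
        total = 0.0
        for tri in tris:
            n = face_normal(*tri)
            cu = sum(p[0] for p in tri) / 3
            cv = sum(p[1] for p in tri) / 3
            d = normal_at(cu, cv)
            total += n[0] * d[0] + n[1] * d[1] + n[2] * d[2]
        return total

    return max(splits, key=lambda name: score(splits[name]))
```

For a ridge running corner to corner, the four corner heights alone cannot tell the two diagonals apart; the data normals are what break the tie.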
Software Tessellation


• How do we build geometry from patch list? 

Difficulty: Dicing patterns vary from 2 to 4 tris 

• Simple algorithm: Degenerate geometry 

• Output 9 verts and 12 indices per patch 

• Extra verts and degenerate triangles not optimal 

• We are only getting indexing within a patch 

• Fast enough to run every frame 
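A sketch of the degenerate-geometry approach: every patch writes 9 vertices (a 3x3 grid of corners, edge midpoints, and center) and 12 indices (4 triangles), padding unused triangle slots with a degenerate triangle. The triangulation (a fan over the patch outline starting at a split-edge midpoint) and the bit layout are assumptions; patterns needing more than 4 triangles are assumed to have been broken into component parts already.

```python
# Bit assignment is an assumption for illustration: one bit per edge.
LEFT, RIGHT, BOTTOM, TOP = 1, 2, 4, 8
VERTS_PER_PATCH, INDICES_PER_PATCH = 9, 12


def patch_triangles(pattern):
    """Local triangles (indices into the 3x3 grid, row-major, y up) for a pattern."""
    ring = [0]                      # outline, counter-clockwise from lower-left
    if pattern & BOTTOM:
        ring.append(1)
    ring.append(2)
    if pattern & RIGHT:
        ring.append(5)
    ring.append(8)
    if pattern & TOP:
        ring.append(7)
    ring.append(6)
    if pattern & LEFT:
        ring.append(3)
    # Fan from a split midpoint (if any) so no fan triangle is collinear.
    mids = [i for i, v in enumerate(ring) if v in (1, 3, 5, 7)]
    start = mids[0] if mids else 0
    ring = ring[start:] + ring[:start]
    return [(ring[0], ring[i], ring[i + 1]) for i in range(1, len(ring) - 1)]


def build_degenerate_buffers(patches):
    """patches: (x, y, level, pattern). Returns fixed-stride VB and IB lists."""
    vb, ib = [], []
    for p, (x, y, level, pattern) in enumerate(patches):
        s = float(1 << level)
        for row in range(3):
            for col in range(3):
                vb.append((x * s + col * s / 2, y * s + row * s / 2))
        tris = patch_triangles(pattern)
        if len(tris) > 4:
            raise ValueError("pattern needs more than 4 triangles; split the patch first")
        tris += [(0, 0, 0)] * (4 - len(tris))  # degenerate padding
        base = p * VERTS_PER_PATCH             # indexing within a patch only
        ib.extend(base + i for tri in tris for i in tri)
    return vb, ib
```

Because every patch has the same stride, each GPU thread can write its slice at `patch_index * 9` and `patch_index * 12` with no coordination, which is what makes this simple scheme fast.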
Software Tessellation

• How do we build better geometry from patch list? 

• AdaptiveTessellationCS40 

• Use prefix-sum to get base vertex/index ID for each patch 

• Tightly packed VB/IB 

• Slower, indexing within patch only 

• Tile Vertex ID table 

• Build table of all possible verts for an entire tile 

• Build verts that are referenced by any patch 

• Resolve vertex ID from table 

• Slowest, indexing across whole tile 
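A serial sketch of the prefix-sum step: each patch's vertex and index counts follow from its dicing pattern, and an exclusive scan turns those counts into base offsets so every patch can write its own slice of a tightly packed VB/IB. On the GPU the scan would itself be a parallel compute pass; the per-pattern counts here assume no center vertex and are not taken from AdaptiveTessellationCS40.

```python
from itertools import accumulate


def split_edges(pattern):
    """Number of split edges in a 4-bit dicing pattern."""
    return bin(pattern & 0xF).count("1")


def exclusive_scan(counts):
    """Exclusive prefix sum: each entry's start offset, plus the grand total."""
    sums = list(accumulate(counts, initial=0))
    return sums[:-1], sums[-1]


def patch_offsets(patterns):
    """Base vertex / base index per patch for tightly packed buffers.

    Assumes no center vertex: 4 corners + 1 midpoint per split edge, and
    (2 + splits) triangles, matching a fan over the patch outline.
    """
    vertex_counts = [4 + split_edges(p) for p in patterns]
    index_counts = [3 * (2 + split_edges(p)) for p in patterns]
    vertex_bases, vertex_total = exclusive_scan(vertex_counts)
    index_bases, index_total = exclusive_scan(index_counts)
    return vertex_bases, vertex_total, index_bases, index_total
```

The totals size the buffers (and the indirect draw), while the bases let each patch write without atomics.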
GPU / Resources

(Figure: GPU stages Create Heightmap, Create Normalmap, Build Tess MIP, Emit Quad List, Build VB/IB, Vertex Displacement, and Shade, alongside the Height and Normal resources)

GAME DEVELOPERS CONFERENCE 2014, MARCH 17-21, 2014, GDCONF.COM
Software Tessellation

• Performance Results 

• AMD A10 APU/8670D GPU

• Final render performance

• GPU processing time for frame

• Pros: Good performance, high quality

• Cons:

• MIP hierarchy more complex + larger

• Need patch list for every visible tile

Conclusion: Pixel shader execution dominates runtime, so it is worth doing
extra work at the geometry level to generate efficient triangles.


GPU frame time (ms), measured with GPU PerfStudio2

Resolution    Hardware    Software    Speedup (frame-time reduction)
1600x1200     6.673       5.044       24.41%
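The 24.41% figure is the reduction in GPU frame time relative to hardware tessellation; as a ratio, the software path is about 1.32x as fast at this resolution:

```latex
\frac{6.673 - 5.044}{6.673} = \frac{1.629}{6.673} \approx 0.2441 = 24.41\%,
\qquad \frac{6.673}{5.044} \approx 1.32
```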
Implementation Tips

• Compute shaders have pros and cons 

• Generally very fast, but can be slower than PS (texture swizzle patterns)

• Can run asynchronously on some hardware 

• Atomic Operations vs. Atomic Counters 

• Atomic operations are general but slow 

• Atomic counters only increment or decrement... 

• ...but have hardware backing on some systems 

• Indirect draw/dispatch 

• Function parameters pulled from GPU buffer 

• Works well for draw calls (parameter is number of verts)

• Harder to use for dispatch (parameters are number of threadgroups)
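The dispatch difficulty is that a GPU-side count of work items must first be converted into threadgroup counts, and D3D11 limits each dispatch dimension to 65,535 groups. A sketch of the arguments a small compute pass might write, assuming a hypothetical group size of 64 threads and 12 indices per patch (the degenerate-geometry scheme); the tuples follow the D3D11 `DispatchIndirect` (3 values) and `DrawIndexedInstancedIndirect` (5 values) argument layouts.

```python
MAX_GROUPS_PER_DIM = 65535  # D3D11 limit per dispatch dimension


def dispatch_indirect_args(item_count, group_size=64):
    """(ThreadGroupCountX, Y, Z) covering item_count threads.

    Rounds up, then folds into Y when X would exceed the limit; the shader
    must then skip flattened thread IDs >= item_count.
    """
    groups = (item_count + group_size - 1) // group_size
    if groups <= MAX_GROUPS_PER_DIM:
        return (groups, 1, 1)
    y = (groups + MAX_GROUPS_PER_DIM - 1) // MAX_GROUPS_PER_DIM
    return (MAX_GROUPS_PER_DIM, y, 1)


def draw_indexed_indirect_args(patch_count, indices_per_patch=12):
    """(IndexCountPerInstance, InstanceCount, StartIndexLocation,
    BaseVertexLocation, StartInstanceLocation) for one instance."""
    return (patch_count * indices_per_patch, 1, 0, 0, 0)
```

The draw arguments are a single multiply of the GPU-side count, while the dispatch arguments need rounding and range handling, which is the asymmetry the slide points out.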
Conclusion

DX11: It's all about compute!
Questions?




We are hiring! 

www.firaxis.com 
Extensions

• Take advantage of flexible geometry generation 

• Create a separate VB for each pixel shader needed

• Can be a huge optimization!


(Figure: Steep slope)
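Splitting the patch list by the shading it needs is cheap once patches are generated on the GPU. A sketch that routes patches into two lists, for example one pixel shader for gentle terrain and another for steep slopes; the 45-degree cutoff, the per-patch minimum normal z input, and the two-shader split are all assumptions for illustration.

```python
import math


def partition_by_slope(patches, max_slope_deg=45.0):
    """Split (patch_id, min_normal_z) pairs into (gentle_ids, steep_ids).

    For a unit normal, z is the cosine of the slope angle, so a patch is steep
    when its lowest normal z falls below cos(max_slope_deg).
    """
    cos_limit = math.cos(math.radians(max_slope_deg))
    gentle, steep = [], []
    for patch_id, min_normal_z in patches:
        (steep if min_normal_z < cos_limit else gentle).append(patch_id)
    return gentle, steep
```

Each list would then feed its own VB and draw call, so the steep-slope shader runs only where it is needed.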








